Video text detection and segmentation for optical character recognition
Internal identifier: 001606 (Main/Exploration); previous: 001605; next: 001607
Authors: Chong-Wah Ngo [Hong Kong]; Chi-Kwong Chan [Hong Kong]
Source:
- Multimedia systems [0942-4962]; 2004.
French descriptors
- Pascal (Inist)
- Segmentation, Reconnaissance optique caractère, Technique vidéo, Réduction bruit, Densité élevée, Contraste image, Méthode projection, Apprentissage, Machine vecteur support, Réseau neuronal, Analyse multirésolution, Transformation cosinus discrète, Taux fausse alarme, Détection seuil, Reconnaissance forme, Classification signal, Traitement signal, Extraction caractéristique.
English descriptors
- KwdEn:
- Discrete cosine transforms, False alarm rate, Feature extraction, High density, Image contrast, Learning, Multiresolution analysis, Neural network, Noise reduction, Optical character recognition, Pattern recognition, Projection method, Segmentation, Signal classification, Signal processing, Support vector machine, Threshold detection, Video technique.
Abstract
In this paper, we present approaches to detecting and segmenting text in videos. The proposed video-text-detection technique is capable of adaptively applying appropriate operators for video frames of different modalities by classifying the background complexities. Effective operators such as the repeated shifting operations are applied for the noise removal of images with high edge density. Meanwhile, a text-enhancement technique is used to highlight the text regions of low-contrast images. A coarse-to-fine projection technique is then employed to extract text lines from video frames. Experimental results indicate that the proposed text-detection approach is superior to the machine-learning-based (such as SVM and neural network), multiresolution-based, and DCT-based approaches in terms of detection and false-alarm rates. Besides text detection, a technique for text segmentation is also proposed based on adaptive thresholding. A commercial OCR package is then used to recognize the segmented foreground text. A satisfactory character-recognition rate is reported in our experiments.
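The detection and segmentation steps summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact algorithm: the coarse-to-fine projection is approximated by a horizontal projection (coarse, finds candidate text bands) followed by a per-band vertical projection (fine, trims the horizontal extent), and the paper's adaptive thresholding is stood in for by a global Otsu threshold. The function names and the `row_thresh`/`col_thresh` parameters are invented for this sketch.

```python
import numpy as np

def find_runs(profile, thresh):
    """Return (start, end) index pairs where the 1-D profile exceeds thresh."""
    mask = profile > thresh
    runs, start = [], None
    for i, m in enumerate(mask):
        if m and start is None:
            start = i
        elif not m and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(mask)))
    return runs

def detect_text_lines(edge_map, row_thresh=0.2, col_thresh=0.1):
    """Coarse-to-fine projection on an edge map (values in [0, 1]).

    Coarse step: the row-wise mean of edge strength locates candidate
    text bands. Fine step: within each band, the column-wise mean trims
    the left/right extent. Returns (top, left, bottom, right) boxes.
    """
    rows = edge_map.mean(axis=1)                      # coarse: one value per row
    boxes = []
    for r0, r1 in find_runs(rows, row_thresh * rows.max()):
        cols = edge_map[r0:r1].mean(axis=0)           # fine: per column in the band
        col_runs = find_runs(cols, col_thresh * cols.max())
        if col_runs:
            boxes.append((r0, col_runs[0][0], r1, col_runs[-1][1]))
    return boxes

def otsu_threshold(gray):
    """Global Otsu threshold on an 8-bit grayscale image (a simple
    stand-in for the paper's adaptive thresholding scheme)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    sum_all = np.dot(np.arange(256), hist)
    sum_b, w_b, best_t, best_var = 0.0, 0, 0, -1.0
    for t in range(256):
        w_b += hist[t]                                # background pixel count
        if w_b == 0:
            continue
        w_f = total - w_b                             # foreground pixel count
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b
        m_f = (sum_all - sum_b) / w_f
        var = w_b * w_f * (m_b - m_f) ** 2            # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

Projecting coarsely first keeps the fine (per-column) pass cheap, since it only runs inside bands that already look like text; the segmented foreground from the threshold step would then be handed to an OCR engine.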
Affiliations:
Links to previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000432
- to stream PascalFrancis, to step Curation: 000355
- to stream PascalFrancis, to step Checkpoint: 000447
- to stream Main, to step Merge: 001658
- to stream Main, to step Curation: 001606
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Video text detection and segmentation for optical character recognition</title>
<author><name sortKey="Ngo, Chong Wah" sort="Ngo, Chong Wah" uniqKey="Ngo C" first="Chong-Wah" last="Ngo">Chong-Wah Ngo</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science, City University of Hong Kong, Tat Chee Avenue</s1>
<s2>Kowloon</s2>
<s3>HKG</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Hong Kong</country>
<wicri:noRegion>Kowloon</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Chan, Chi Kwong" sort="Chan, Chi Kwong" uniqKey="Chan C" first="Chi-Kwong" last="Chan">Chi-Kwong Chan</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science, City University of Hong Kong, Tat Chee Avenue</s1>
<s2>Kowloon</s2>
<s3>HKG</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Hong Kong</country>
<wicri:noRegion>Kowloon</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">05-0469299</idno>
<date when="2004">2004</date>
<idno type="stanalyst">PASCAL 05-0469299 INIST</idno>
<idno type="RBID">Pascal:05-0469299</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000432</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000355</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000447</idno>
<idno type="wicri:doubleKey">0942-4962:2004:Ngo C:video:text:detection</idno>
<idno type="wicri:Area/Main/Merge">001658</idno>
<idno type="wicri:Area/Main/Curation">001606</idno>
<idno type="wicri:Area/Main/Exploration">001606</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Video text detection and segmentation for optical character recognition</title>
<author><name sortKey="Ngo, Chong Wah" sort="Ngo, Chong Wah" uniqKey="Ngo C" first="Chong-Wah" last="Ngo">Chong-Wah Ngo</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science, City University of Hong Kong, Tat Chee Avenue</s1>
<s2>Kowloon</s2>
<s3>HKG</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Hong Kong</country>
<wicri:noRegion>Kowloon</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Chan, Chi Kwong" sort="Chan, Chi Kwong" uniqKey="Chan C" first="Chi-Kwong" last="Chan">Chi-Kwong Chan</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science, City University of Hong Kong, Tat Chee Avenue</s1>
<s2>Kowloon</s2>
<s3>HKG</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Hong Kong</country>
<wicri:noRegion>Kowloon</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Multimedia systems</title>
<title level="j" type="abbreviated">Multimedia syst.</title>
<idno type="ISSN">0942-4962</idno>
<imprint><date when="2004">2004</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Multimedia systems</title>
<title level="j" type="abbreviated">Multimedia syst.</title>
<idno type="ISSN">0942-4962</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Discrete cosine transforms</term>
<term>False alarm rate</term>
<term>Feature extraction</term>
<term>High density</term>
<term>Image contrast</term>
<term>Learning</term>
<term>Multiresolution analysis</term>
<term>Neural network</term>
<term>Noise reduction</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Projection method</term>
<term>Segmentation</term>
<term>Signal classification</term>
<term>Signal processing</term>
<term>Support vector machine</term>
<term>Threshold detection</term>
<term>Video technique</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Segmentation</term>
<term>Reconnaissance optique caractère</term>
<term>Technique vidéo</term>
<term>Réduction bruit</term>
<term>Densité élevée</term>
<term>Contraste image</term>
<term>Méthode projection</term>
<term>Apprentissage</term>
<term>Machine vecteur support</term>
<term>Réseau neuronal</term>
<term>Analyse multirésolution</term>
<term>Transformation cosinus discrète</term>
<term>Taux fausse alarme</term>
<term>Détection seuil</term>
<term>Reconnaissance forme</term>
<term>Classification signal</term>
<term>Traitement signal</term>
<term>Extraction caractéristique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">In this paper, we present approaches to detecting and segmenting text in videos. The proposed video-text-detection technique is capable of adaptively applying appropriate operators for video frames of different modalities by classifying the background complexities. Effective operators such as the repeated shifting operations are applied for the noise removal of images with high edge density. Meanwhile, a text-enhancement technique is used to highlight the text regions of low-contrast images. A coarse-to-fine projection technique is then employed to extract text lines from video frames. Experimental results indicate that the proposed text-detection approach is superior to the machine-learning-based (such as SVM and neural network), multiresolution-based, and DCT-based approaches in terms of detection and false-alarm rates. Besides text detection, a technique for text segmentation is also proposed based on adaptive thresholding. A commercial OCR package is then used to recognize the segmented foreground text. A satisfactory character-recognition rate is reported in our experiments.</div>
</front>
</TEI>
<affiliations><list><country><li>Hong Kong</li>
</country>
</list>
<tree><country name="Hong Kong"><noRegion><name sortKey="Ngo, Chong Wah" sort="Ngo, Chong Wah" uniqKey="Ngo C" first="Chong-Wah" last="Ngo">Chong-Wah Ngo</name>
</noRegion>
<name sortKey="Chan, Chi Kwong" sort="Chan, Chi Kwong" uniqKey="Chan C" first="Chi-Kwong" last="Chan">Chi-Kwong Chan</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001606 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001606 | SxmlIndent | more
To add a link to this page in the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:05-0469299 |texte= Video text detection and segmentation for optical character recognition }}
This area was generated with Dilib version V0.6.32.